External Sampling
Abstract
We initiate the study of sublinear-time algorithms in the external memory model [14]. In this model, the data is stored in blocks of a fixed size B, and the algorithm is charged a unit cost for each block access. The model is well studied because it reflects the computational issues that arise when a massive input is stored on disk. Since each block access operates on B data elements in parallel, many problems have external memory algorithms whose number of block accesses is only a small fraction (e.g., 1/B) of their main memory complexity. However, to the best of our knowledge, no such reduction in complexity is known for any sublinear-time algorithm. One plausible explanation is that the vast majority of sublinear-time algorithms use random sampling and thus exhibit no locality of reference. This state of affairs is unfortunate, since both sublinear-time algorithms and the external memory model are important approaches to dealing with massive data sets, and ideally they should be combined to achieve the best performance. In this paper we show that such a combination is indeed possible. In particular, we consider three well-studied problems: testing distinctness, uniformity, and identity of an empirical distribution induced by data. For these problems we give random-sampling-based algorithms whose number of block accesses is up to a factor of 1/√B smaller than the main memory complexity of those problems. We also show that this improvement is optimal for those problems. Since these problems are natural primitives for a number of sampling-based algorithms for other problems, our tools improve the external memory complexity of other problems as well.
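To make the cost model concrete, here is a minimal Python sketch of block-level sampling for a distinctness check. It is an illustration under stated assumptions, not the algorithm from the paper: the input is modeled as an in-memory list, each call to the hypothetical helper `read_block` counts as one unit-cost block access, and the number of blocks to sample is left to the caller. All function and parameter names are invented for this example.

```python
import math
import random


def read_block(data, block_index, block_size):
    """Simulate one unit-cost block access: return the elements of one block."""
    start = block_index * block_size
    return data[start:start + block_size]


def has_collision_block_sampling(data, block_size, num_blocks, rng):
    """Sample whole blocks uniformly at random and look for a repeated value.

    Returns (collision_found, block_accesses). Every element of a sampled
    block is used, so each access contributes up to block_size samples.
    This is an illustrative sketch, not the paper's tester.
    """
    total_blocks = math.ceil(len(data) / block_size)
    # Sample distinct block indices without replacement.
    chosen = rng.sample(range(total_blocks), min(num_blocks, total_blocks))
    seen = set()
    accesses = 0
    for block_index in chosen:
        accesses += 1  # one unit of cost per block read
        for value in read_block(data, block_index, block_size):
            if value in seen:
                return True, accesses
            seen.add(value)
    return False, accesses
```

Reading a sampled block in full yields up to B elements for a single unit of cost, which is the kind of locality that element-by-element random sampling does not exploit. The sketch carries no error guarantee; the number of blocks required for a given guarantee is the subject of the paper's analysis.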
Publication date: 2009